An Ensemble Machine Learning Model Using Gradient Boosting Identifies Patients with Disease Progression in Newly Diagnosed Multiple Myeloma

Williams, Louis S.; Khosravi, Bardia; Velimirovic, Marko; Khouri, Jack; Raza, Shahzad; Mazzoni, Sandra; Samaras, Christy J.; Awada, Hussein; Dima, Danai; Valent, Jason; Anwer, Faiz

doi:10.1182/blood-2023-188762

Louis S. Williams (1), Bardia Khosravi (2), Marko Velimirovic (3), Jack Khouri (4), Shahzad Raza (3), Sandra Mazzoni (5), Christy J. Samaras (1), Hussein Awada (6), Danai Dima (6), Jason Valent (1), Faiz Anwer (1)

1Department of Hematology and Medical Oncology, Taussig Cancer Institute, Cleveland Clinic, Cleveland, OH

2Department of Radiology, Mayo Clinic, Rochester, MN

3Department of Hematology and Medical Oncology, Cleveland Clinic Foundation, Cleveland, OH

4Department of Medicine, Cleveland Clinic, Lerner College of Medicine of Case Western Reserve University, Cleveland, OH

5Department of Hematology and Medical Oncology, Taussig Cancer Center, Cleveland Clinic, Cleveland, OH

6Department of Translational Hematology and Oncology Research, Taussig Cancer Institute, Cleveland Clinic, Cleveland, OH

Introduction:

In many cases, treatment decisions for patients with multiple myeloma must be made in the absence of high-quality randomized controlled clinical trials. As a result, many clinicians rely on prognostic models to guide treatment selection. The most widely used of these models incorporate a limited number of patient-related factors that reflect tumor burden, cytogenetic features, or gene expression profiling. These prediction scores are derived from regression models that incorporate these variables at the time of diagnosis. They do not account for variables that change dynamically over time, do not explore the relationship between treatment selection and progression, and imperfectly predict survival outcomes. We designed a prognostic model based on an ensemble machine learning platform to predict progression in newly diagnosed multiple myeloma (NDMM) using Extreme Gradient Boosting (XGBoost) combined with accelerated failure time modeling for survival analysis.

Methods:

We used a large retrospective data set containing treatment and response information for 1127 patients with newly diagnosed multiple myeloma (NDMM) treated at the Cleveland Clinic Foundation between 2000 and 2023. Following data preprocessing based on data completeness, 953 patient records were included in our training, testing, and validation data sets. For the purposes of our analysis, the initial data set was randomly split into training (70%), testing (15%), and validation (15%) subsets at the patient level. We also defined lower and upper bounds on the survival label for the records in each subset, which is necessary because survival data are right-censored. Hyperparameters for XGBoost were optimized using Bayesian search, minimizing the negative log-likelihood. A random forest method (MissForest) was applied for the imputation of missing data. We then trained the XGBoost model using GPU acceleration for computational efficiency. Finally, a log-rank test on predicted survival times was used to test the model's performance on patients with and without known progression.
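As an illustration of the patient-level split and the censoring bounds described above, the following is a minimal sketch, not the authors' code. The function names, the random seed, and the rounding rule are assumptions. The bound convention (lower = upper = observed time for a progression event; upper = infinity for a right-censored record) is the one used by accelerated failure time objectives such as XGBoost's `survival:aft`, which reads these values from the `label_lower_bound` and `label_upper_bound` fields.

```python
import math
import random


def split_patients(patient_ids, seed=0, fractions=(0.70, 0.15, 0.15)):
    """Randomly split unique patient IDs into train/test/validation lists.

    Splitting on IDs (not rows) keeps every record of a patient in one subset.
    The seed and the rounding rule are illustrative assumptions.
    """
    ids = sorted(set(patient_ids))  # one entry per patient, stable order
    rng = random.Random(seed)
    rng.shuffle(ids)
    n_train = round(fractions[0] * len(ids))
    n_test = round(fractions[1] * len(ids))
    train = ids[:n_train]
    test = ids[n_train:n_train + n_test]
    valid = ids[n_train + n_test:]  # remainder goes to validation
    return train, test, valid


def aft_bounds(time_months, progressed):
    """Return (lower, upper) bounds on the survival label for one record.

    Observed event: the true time is known, so lower == upper.
    Right-censored: the true time is at least the follow-up time, so the
    upper bound is +infinity.
    """
    lower = float(time_months)
    upper = lower if progressed else math.inf
    return lower, upper


# Usage with hypothetical data: 953 patient IDs, one event, one censored record.
train, test, valid = split_patients(range(953))
print(len(train), len(test), len(valid))  # 667 143 143
print(aft_bounds(44.0, True))    # (44.0, 44.0)
print(aft_bounds(35.0, False))   # (35.0, inf)
```

Splitting on patient identifiers rather than on rows avoids leaking information from one patient's records into more than one subset.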

Results:

Our preprocessed data set had 953 patients with a mean age at disease onset of 65 years and a slight male predominance (55%). Approximately 27% of patients harbored high cytogenetic risk disease, and 3.5% of the cohort presented with extramedullary disease at diagnosis. At a median follow-up of 35 months, 47% of patients had experienced disease progression or death. Induction therapy included an immunomodulatory drug in 40% of patients and proteasome inhibitors in 22%. Frontline autologous stem cell transplantation followed induction in 28% of patients. Median progression-free survival in the overall data set was 44 months. These features were consistent across the randomly assigned training, testing, and validation cohorts.

Following data preprocessing, 34 independent clinical and genomic variables, including selection of first-line treatment, were assessed as model inputs for each available record. Under optimized parameters, the model was trained with a maximum tree depth of 7 and a learning rate of approximately 0.22 for the desired output of progression-free survival (PFS). We then queried PFS for both the training and validation sets and divided patients into those known to have progressed and those without a progression event. Survival analysis was undertaken using predicted PFS values for patients known to have progressed and those who remained progression-free. For the validation data set, our model successfully discerned between progressed and non-progressed cases (log-rank test statistic = 22.07, p < 0.005) (Figure 1).
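The log-rank comparison reported above can be sketched in a few lines of standard-library Python. This is a generic two-group log-rank test, not the authors' implementation. The function names and the toy predicted PFS values are hypothetical, and treating every predicted time as an observed event is an assumption, since the abstract does not say how censoring was handled in this step. With two groups the statistic is referred to a chi-square distribution with 1 degree of freedom, under which a value of 22.07 corresponds to a p-value well below 0.005.

```python
import math


def chi2_1df_pvalue(stat):
    """Upper-tail p-value of a chi-square statistic with 1 degree of freedom."""
    return math.erfc(math.sqrt(stat / 2.0))


def logrank_test(times_a, events_a, times_b, events_b):
    """Two-group log-rank test; returns (chi-square statistic, p-value)."""
    data = [(t, e, 0) for t, e in zip(times_a, events_a)]
    data += [(t, e, 1) for t, e in zip(times_b, events_b)]
    event_times = sorted({t for t, e, _ in data if e})

    obs_minus_exp = 0.0  # sum over event times of (observed - expected) in group A
    variance = 0.0       # hypergeometric variance of that sum
    for t in event_times:
        n_a = sum(1 for x, _, g in data if x >= t and g == 0)  # at risk, group A
        n_b = sum(1 for x, _, g in data if x >= t and g == 1)  # at risk, group B
        d_a = sum(1 for x, e, g in data if x == t and e and g == 0)
        d_b = sum(1 for x, e, g in data if x == t and e and g == 1)
        n, d = n_a + n_b, d_a + d_b
        obs_minus_exp += d_a - d * n_a / n
        if n > 1:
            variance += d * (n_a / n) * (n_b / n) * (n - d) / (n - 1)

    if variance == 0.0:
        return 0.0, 1.0  # no information to separate the groups
    stat = obs_minus_exp ** 2 / variance
    return stat, chi2_1df_pvalue(stat)


# Usage with hypothetical predicted PFS values (months); all treated as events.
predicted_progressed = [12.0, 18.0, 20.0, 25.0, 31.0]
predicted_progression_free = [40.0, 47.0, 52.0, 60.0, 66.0]
stat, p = logrank_test(
    predicted_progressed, [1] * 5,
    predicted_progression_free, [1] * 5,
)
print(f"log-rank statistic = {stat:.2f}, p = {p:.4f}")
print(chi2_1df_pvalue(22.07) < 0.005)  # True
```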

Discussion:

We present a machine learning approach based on regularized gradient boosting that accurately discerns between patients who experience progression and those who remain progression-free at a median follow-up of approximately 35 months in a large retrospective data set of patients presenting with NDMM. Further elaboration of our model will allow for the incorporation of large amounts of data to predict survival outcomes on the basis of dynamic variables such as depth of response and treatment selection. With the ability to parse outcomes among multiple myeloma patients at high resolution, future clinicians and clinical trialists may be able to overcome limitations in trial design and patient-level therapy selection for both newly diagnosed and relapsed patients.

Disclosures

Williams: Janssen: Consultancy; Abbvie: Consultancy; Bristol Myers Squibb: Consultancy. Khouri: GPCR Therapeutics: Other: Payment or honoraria for lectures, presentations, speakers bureaus, manuscript writing or educational events; Janssen: Consultancy, Membership on an entity's Board of Directors or advisory committees, Other: Payment or honoraria for lectures, presentations, speakers bureaus, manuscript writing or educational events. Raza: Kite: Membership on an entity's Board of Directors or advisory committees, Speakers Bureau; Incyte: Membership on an entity's Board of Directors or advisory committees, Speakers Bureau. Valent: Alexion, AstraZeneca Rare Disease: Research Funding.

Figure 1



2023


ASH Publications

American Society of Hematology
